Implementation Aspects Of Large Vocabulary Recognition Based On Intraword And Interword Phonetic Units
Abstract
Most large vocabulary speech recognition systems consist of a training algorithm and a recognition structure, the latter being essentially a search for the best path through a rather large decoding network. Although the performance of the recognizer is crucially tied to the details of the training procedure, it is essential that the recognition structure be efficient in computation and memory, and accurate in actually determining the best path through the lattice, so that a wide range of training (subword unit creation) strategies can be evaluated in a reasonable time. We have considered an architecture that combines several well-known procedures (beam search, compiled network, etc.) with some new ideas (stacks of active network nodes, likelihood computation on demand, guided search, etc.) to implement a search procedure that maintains the accuracy of the full search but can decode a single sentence in about one minute of computing time (about 20 times real time) on a vectorized, concurrent processor. This paper describes how we realized this significant computational reduction.
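The abstract names beam search, a list of active network nodes, and likelihood computation on demand without giving details. The following is a minimal sketch of how those three ideas can fit together, not the authors' implementation. The function `beam_search`, the arc format, the toy network, and the squared-distance scoring function are all hypothetical and chosen only for illustration.

```python
def beam_search(arcs, start, final, frames, log_likelihood, beam):
    """Time-synchronous best-path search over a small decoding network.

    arcs: dict mapping node -> list of (next_node, unit_label, log_transition_prob)
    Returns (best_log_score, unit_path, likelihood_evaluations).
    Hypothetical interface, assumed for this sketch only.
    """
    # Only active nodes are stored: node -> (log score, units on the best path so far).
    active = {start: (0.0, ())}
    evaluations = 0
    for frame in frames:
        cache = {}      # per-frame likelihoods, filled only when an active arc needs them
        expanded = {}
        for node, (score, path) in active.items():
            for nxt, label, log_p in arcs.get(node, ()):
                if label not in cache:
                    # Likelihood on demand: units not reachable from any
                    # active node are never scored for this frame.
                    cache[label] = log_likelihood(label, frame)
                    evaluations += 1
                cand = score + log_p + cache[label]
                # Viterbi recombination: keep the best path into each node.
                if nxt not in expanded or cand > expanded[nxt][0]:
                    expanded[nxt] = (cand, path + (label,))
        if not expanded:
            return None, (), evaluations
        # Beam pruning: drop nodes scoring more than `beam` below the frame's best.
        best = max(s for s, _ in expanded.values())
        active = {n: sp for n, sp in expanded.items() if sp[0] >= best - beam}
    score, path = active.get(final, (None, ()))
    return score, path, evaluations


# Toy setup (all values invented): each unit is scored by squared distance
# between a scalar "frame" and the unit's mean.
MEANS = {"a": 0.0, "b": 1.0, "c": 2.0}


def toy_log_likelihood(label, frame):
    return -(frame - MEANS[label]) ** 2


TOY_ARCS = {
    0: [(1, "a", 0.0), (2, "b", 0.0)],
    1: [(3, "c", 0.0)],
    2: [(3, "a", 0.0)],
}
TOY_FRAMES = [0.1, 1.9]

# An infinite beam is a full search; a narrow beam prunes the weaker branch.
wide = beam_search(TOY_ARCS, 0, 3, TOY_FRAMES, toy_log_likelihood, beam=float("inf"))
narrow = beam_search(TOY_ARCS, 0, 3, TOY_FRAMES, toy_log_likelihood, beam=0.5)
```

In this toy case the narrow beam returns the same path as the full search while scoring one fewer unit, because the pruned node's outgoing arc is never evaluated. Real decoders work over far larger networks, where whether pruning preserves the full-search result depends on the beam width.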
Similar resources
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), as one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...
The use of syllable phonotactics for word hypothesization
A search technique incorporating the automatic modeling of lexical variability is introduced for medium or large-vocabulary speaker-independent speech recognition. Current state-of-the-art systems depend on being able to model the entire language based on acoustic features and the constraints of syntax or interword probabilities. These methods often fail in the presence of multiple speakers, new vo...
An Investigation of Subword Unit Representations for Spoken Document Retrieval
This study investigates the feasibility of using subword unit representations for spoken document retrieval as an alternative to using words generated by either keyword spotting or word recognition. Our investigation is motivated by the observation that word-based retrieval approaches face the problem of either having to know the keywords to search for a priori, or requiring a very large recogn...
Using Chi-Square Testing in Modeling Confusion Characteristics for Robust Phonetic Set Generation
A phonetic representation of a language is used to describe the corresponding pronunciation and synthesize the acoustic model of any vocabulary. In order to obtain better phonetic representation, context-dependent units are used to model co-articulation effects between phones and have been broadly used in speech recognition. However, this representation generally increases the number of recognition ...
Generation of robust phonetic set and decision tree for Mandarin using chi-square testing
A phonetic representation of a language is used to describe the corresponding pronunciation and synthesize the acoustic model of any vocabulary. A phonetic representation with smaller phonetic units such as SAMPA-C for Mandarin Chinese and decision trees for parameter sharing are broadly applied to deal with the problem of large numbers of recognition units. However, the confusable phonetic rep...
Publication date: 1990